Reinforcement Learning in Large Population Models: A Continuity Equation Approach
Authors
Abstract
We study an evolutionary model in which strategy revision protocols are based on agent-specific characteristics rather than wider social characteristics. We assume that agents are primed to play mixed strategies. At any time, the distribution of mixed strategies over agents in a population is described by a probability measure. In each round, a pair of randomly chosen agents plays a game, after which they update their mixed strategies using certain reinforcement-driven rules based on payoff information. The distribution over mixed strategies thus changes. In a continuous-time limit, this change is described by non-linear continuity equations. We provide a general solution to these equations, which we use to analyze some simple evolutionary scenarios: negative definite symmetric games, doubly symmetric games, generic 2×2 symmetric games, and 2×2 asymmetric games. A key finding is that, when agents carry mixed strategies, distributional considerations cannot be subsumed under a classical approach such as the deterministic replicator dynamics.
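The round-by-round process described in the abstract can be illustrated with a small discrete simulation. The sketch below is only an illustration under stated assumptions: the abstract does not specify the update rule, so a Cross-style reinforcement rule (shift probability toward the action just played, in proportion to a payoff in [0, 1]) is swapped in, and the names `cross_update` and `simulate`, the step size, and the example payoff matrix are all hypothetical.

```python
import random


def cross_update(x, action, payoff, step=0.1):
    """Assumed Cross-style rule: move mixed strategy x toward the played
    action by step * payoff, with payoff assumed to lie in [0, 1]."""
    return [
        xi + step * payoff * ((1.0 if i == action else 0.0) - xi)
        for i, xi in enumerate(x)
    ]


def simulate(payoffs, n_agents=200, rounds=20000, step=0.1, seed=0):
    """Simulate a population whose agents each carry a mixed strategy.

    payoffs[i][j] is the (assumed, normalized) payoff to playing pure
    strategy i against pure strategy j in a symmetric game.
    """
    rng = random.Random(seed)
    k = len(payoffs)

    # Initial distribution over mixed strategies: one random point on the
    # simplex per agent (an arbitrary choice for illustration).
    population = []
    for _ in range(n_agents):
        weights = [rng.random() for _ in range(k)]
        total = sum(weights)
        population.append([w / total for w in weights])

    for _ in range(rounds):
        # A pair of randomly chosen agents plays the game.
        a, b = rng.sample(range(n_agents), 2)
        act_a = rng.choices(range(k), weights=population[a])[0]
        act_b = rng.choices(range(k), weights=population[b])[0]

        # Each agent reinforces the action it played using its own payoff.
        population[a] = cross_update(population[a], act_a, payoffs[act_a][act_b], step)
        population[b] = cross_update(population[b], act_b, payoffs[act_b][act_a], step)

    return population


if __name__ == "__main__":
    # Hypothetical 2x2 symmetric game with payoffs scaled into [0, 1].
    game = [[0.0, 1.0], [0.25, 0.5]]
    final_population = simulate(game, n_agents=100, rounds=5000)
    mean_first = sum(x[0] for x in final_population) / len(final_population)
    print(f"mean weight on strategy 0: {mean_first:.3f}")
```

The object of interest is the whole list `population`, an empirical distribution over mixed strategies, rather than only its mean. This is the distributional information that, according to the abstract, a deterministic replicator dynamic does not capture.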
Similar resources
Reinforcement Learning in Evolutionary Games
We study an evolutionary model in which strategy revision protocols are based on agent-specific characteristics rather than wider social characteristics. We assume that agents are primed to play a mixed strategy, with the weights on each pure strategy modifiable on the basis of experience. At any time, the distribution of mixed strategies over agents in a large population is described by a prob...
Adaptive-Resolution Reinforcement Learning with Efficient Exploration in Deterministic Domains
We propose a model-based learning algorithm, the Adaptive-resolution Reinforcement Learning (ARL) algorithm, that aims to solve the online, continuous state space reinforcement learning problem in a deterministic domain. Our goal is to combine an adaptive-resolution approximation scheme with efficient exploration in order to obtain fast (polynomial) learning rates. The proposed algorithm uses an a...
Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach
Wireless Sensor Networks (WSNs) consist of independent distributed sensors with storage, processing, sensing and communication capabilities that monitor physical or environmental conditions. There are a number of challenges in WSNs because of limited battery power, communication, computation and storage space. In recent years, computational intelligence approaches such as evolutionar...
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transitio...
Reduced-Order Models for Data-Limited Reinforcement Learning
Often in reinforcement learning we are confronted with planning in a domain with an unknown dynamics model. In poorly understood domains where an exact model of either the dynamics, value function, or policy is unavailable, we commonly choose to use large models consisting of many parameters. Unfortunately, these large models can require a prohibitively large amount of training in real-world dom...